
    Generative modelling and adversarial learning

    University of Technology Sydney, Faculty of Engineering and Information Technology.
    A main goal of statistics and machine learning is to represent and manipulate high-dimensional probability distributions of real-world data, such as natural images. Generative adversarial networks (GANs), which are based on the adversarial learning paradigm, are one of the main families of methods for deriving generative models from complicated real-world data. A GAN and its variants use a generator to synthesise semantic data from standard signal distributions and train a discriminator to distinguish real samples in the training dataset from fake samples synthesised by the generator. As its adversary, the generator aims to deceive the discriminator by producing ever more realistic samples. Through this two-player adversarial game between the generator and the discriminator, the generated distribution can approximate the real-world distribution and samples can be drawn from it. This thesis aims both to improve the quality of generative modelling and to manipulate generated samples by specifying multiple scene properties. A novel framework for training GANs is proposed to stabilise the training process and produce more realistic samples. Unlike existing GANs, which alternately train a generator and a discriminator using a pre-defined adversarial objective function, the proposed framework uses different adversarial training objectives as mutation operations and trains a population of generators to adapt to the environment (i.e. the discriminator). The samples generated by the different generators are evaluated, and only well-performing generators are preserved and used for further training. In this way, the proposed framework overcomes the limitations of any individual adversarial training objective and always preserves the best offspring, contributing to the progress and success of GANs. Building on this framework, the thesis devises a novel model called the perceptual adversarial network (PAN). The proposed PAN consists of two feed-forward convolutional neural networks: a transformation network and a discriminative network. Besides the generative adversarial loss widely used in GANs, the thesis proposes a perceptual adversarial loss, which undergoes adversarial training between the transformation network and the hidden layers of the discriminative network. The hidden layers and output of the discriminative network are updated to constantly and automatically discover discrepancies between a transformed image and the corresponding ground truth, while the image transformation network is trained to minimise the discrepancies identified by the discriminative network. Furthermore, to extend generative models to more challenging re-rendering tasks, the thesis explores disentangled representations encoded in real-world samples and proposes a principled tag disentangled generative adversarial network for re-rendering new samples of an object of interest from a single image by specifying multiple scene properties. Specifically, from an input sample, a disentangling network extracts disentangled and interpretable representations, which are then used to generate new samples with the generative network. To improve the quality of the disentangled representations, a tag mapping net checks the consistency between an image and its tags. Finally, experiments on challenging datasets and image synthesis tasks demonstrate the good performance of the proposed frameworks on the problems of interest.
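
    As a rough illustration of the evolutionary training scheme sketched above, the following PyTorch snippet trains a small population of generator offspring with different adversarial objectives as mutations and keeps only the fittest. It is a minimal toy sketch, not the thesis code: the Gaussian "real" data, the MLP networks, the three mutation objectives, and the discriminator-score fitness measure are all illustrative assumptions.

```python
# Minimal sketch of evolutionary adversarial training: a population of generator
# "offspring", each mutated with a different adversarial objective, is evaluated
# against the current discriminator and only the fittest offspring survives.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

def minimax_loss(fake_logits):        # original minimax generator objective
    return -F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))

def heuristic_loss(fake_logits):      # non-saturating generator objective
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))

def least_squares_loss(fake_logits):  # least-squares generator objective
    return ((torch.sigmoid(fake_logits) - 1.0) ** 2).mean()

mutations = [minimax_loss, heuristic_loss, least_squares_loss]

for step in range(200):
    real = torch.randn(64, data_dim) * 0.5 + 2.0            # toy "real" distribution
    # --- update the discriminator (the environment) ---
    z = torch.randn(64, latent_dim)
    d_loss = (F.binary_cross_entropy_with_logits(D(real), torch.ones(64, 1)) +
              F.binary_cross_entropy_with_logits(D(G(z).detach()), torch.zeros(64, 1)))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # --- mutate: train one offspring per adversarial objective ---
    offspring, fitness = [], []
    for mutate in mutations:
        child = copy.deepcopy(G)
        opt_child = torch.optim.Adam(child.parameters(), lr=1e-3)
        g_loss = mutate(D(child(torch.randn(64, latent_dim))))
        opt_child.zero_grad()
        g_loss.backward()
        opt_child.step()
        # fitness: how "real" the discriminator finds the child's samples
        with torch.no_grad():
            score = torch.sigmoid(D(child(torch.randn(256, latent_dim)))).mean().item()
        offspring.append(child)
        fitness.append(score)

    # --- select: only the best-performing offspring is preserved ---
    G = offspring[max(range(len(offspring)), key=lambda i: fitness[i])]
```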

    A CRF Sequence Labeling Approach to Chinese Punctuation Prediction


    A User-Centered Concept Mining System for Query and Document Understanding at Tencent

    Concepts embody the knowledge of the world and facilitate the cognitive processes of human beings. Mining concepts from web documents and constructing the corresponding taxonomy are core research problems in text understanding and support many downstream tasks such as query analysis, knowledge base construction, recommendation, and search. However, we argue that most prior studies extract formal and overly general concepts from Wikipedia or static web pages, which do not represent the user perspective. In this paper, we describe our experience of implementing and deploying ConcepT in Tencent QQ Browser. It discovers user-centered concepts at the right granularity, conforming to user interests, by mining a large volume of user queries and interactive search click logs. The extracted concepts have the proper granularity, are consistent with user language styles, and are dynamically updated. We further present our techniques to tag documents with user-centered concepts and to construct a topic-concept-instance taxonomy, which has helped to improve search as well as news feed recommendation in Tencent QQ Browser. We performed an extensive offline evaluation to demonstrate that our approach extracts concepts of higher quality than several existing methods. Our system has been deployed in Tencent QQ Browser, and results from online A/B testing involving a large number of real users suggest that the Impression Efficiency of feeds users increased by 6.01% after incorporating the user-centered concepts into the recommendation framework. Comment: Accepted by KDD 2019.
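
    As a toy illustration of the topic-concept-instance taxonomy and concept-based document tagging mentioned above (the node classes, matching rule, and example data are hypothetical, not ConcepT's production pipeline):

```python
# Illustrative sketch of a three-level topic-concept-instance taxonomy and a naive
# document tagger: a document receives a concept tag when enough of the concept's
# instances occur in its text. All content and thresholds are assumptions.
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    name: str                                            # user-centered concept
    instances: set[str] = field(default_factory=set)     # entities covered by the concept

@dataclass
class TopicNode:
    name: str                                            # coarse topic
    concepts: list[ConceptNode] = field(default_factory=list)

def tag_document(text: str, topics: list[TopicNode], min_hits: int = 2) -> list[str]:
    """Tag a document with every concept whose instances appear at least `min_hits` times."""
    lowered = text.lower()
    tags = []
    for topic in topics:
        for concept in topic.concepts:
            hits = sum(1 for inst in concept.instances if inst.lower() in lowered)
            if hits >= min_hits:
                tags.append(f"{topic.name}/{concept.name}")
    return tags

# hypothetical taxonomy and document
taxonomy = [TopicNode("technology", [
    ConceptNode("budget 5G phones", {"Redmi Note", "Realme GT", "Galaxy A54"}),
])]
print(tag_document("Review: the Redmi Note and Galaxy A54 are solid budget picks.", taxonomy))
```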

    Cross-Modal Contrastive Learning for Robust Reasoning in VQA

    Multi-modal reasoning in visual question answering (VQA) has witnessed rapid progress recently. However, most reasoning models heavily rely on shortcuts learned from the training data, which prevents their use in challenging real-world scenarios. In this paper, we propose a simple but effective cross-modal contrastive learning strategy to get rid of the shortcut reasoning caused by imbalanced annotations and to improve overall performance. Unlike existing contrastive learning with complex negative categories at the coarse (Image, Question, Answer) triplet level, we leverage the correspondences between the language and image modalities to perform finer-grained cross-modal contrastive learning. We treat each Question-Answer (QA) pair as a whole and differentiate between images that conform with it and those against it. To alleviate the issue of sampling bias, we further build connected graphs among images: for each positive pair, we regard the images from different graphs as negative samples and derive a multi-positive version of contrastive learning. To the best of our knowledge, this is the first paper to show that a general contrastive learning strategy, without delicate hand-crafted rules, can contribute to robust VQA reasoning. Experiments on several mainstream VQA datasets demonstrate our superiority over the state of the art. Code is available at https://github.com/qizhust/cmcl_vqa_pl.
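
    A minimal sketch of the multi-positive, cross-modal contrastive objective described above, assuming placeholder QA-pair and image embeddings and an InfoNCE-style loss with a temperature; the actual encoders, temperature value, and graph-based negative sampling are not reproduced here.

```python
# Sketch of a multi-positive cross-modal contrastive loss: each QA-pair embedding is
# contrasted against a batch of image embeddings, where several images may be marked
# as positives (they conform with the QA pair) and the rest act as negatives.
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(qa_emb, img_emb, positive_mask, temperature=0.07):
    """
    qa_emb:        (B, d) embeddings of Question-Answer pairs
    img_emb:       (N, d) embeddings of candidate images
    positive_mask: (B, N) boolean, True where image n conforms with QA pair b
    """
    qa_emb = F.normalize(qa_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)
    logits = qa_emb @ img_emb.t() / temperature              # (B, N) similarities
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # average the log-probability over all positives of each QA pair
    pos_count = positive_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * positive_mask.float()).sum(dim=1) / pos_count
    return loss.mean()

# toy usage with random embeddings and one positive image per QA pair
qa = torch.randn(4, 128)
imgs = torch.randn(16, 128)
mask = torch.zeros(4, 16, dtype=torch.bool)
mask[torch.arange(4), torch.arange(4)] = True
print(multi_positive_contrastive_loss(qa, imgs, mask))
```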

    MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models

    The advent of open-source AI communities has produced a cornucopia of powerful text-guided diffusion models trained on various datasets. However, few explorations have been conducted on ensembling such models to combine their strengths. In this work, we propose a simple yet effective method called Saliency-aware Noise Blending (SNB) that empowers fused text-guided diffusion models to achieve more controllable generation. Specifically, we experimentally find that the responses of classifier-free guidance are highly related to the saliency of generated images. We therefore propose to trust different models in their areas of expertise by blending the predicted noises of the two diffusion models in a saliency-aware manner. SNB is training-free and can be completed within a DDIM sampling process. Additionally, it can automatically align the semantics of the two noise spaces without requiring additional annotations such as masks. Extensive experiments show the impressive effectiveness of SNB in various applications. The project page is available at https://magicfusion.github.io/.
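
    A minimal sketch of blending two models' noise predictions in a saliency-aware way, under the assumption that saliency can be approximated by the magnitude of each model's classifier-free-guidance response; the interface, saliency heuristic, and softmax sharpness are illustrative, not the MagicFusion implementation.

```python
# Sketch of Saliency-aware Noise Blending (SNB): given the classifier-free-guided
# noise predictions of two diffusion models, derive a per-pixel saliency map from the
# magnitude of each model's guidance response and blend the predictions so that each
# model dominates where its response is stronger.
import torch

def cfg_noise(eps_uncond, eps_cond, guidance_scale=7.5):
    """Standard classifier-free guidance; returns the guided noise and the guidance response."""
    response = eps_cond - eps_uncond
    return eps_uncond + guidance_scale * response, response

def saliency_aware_blend(eps_a, resp_a, eps_b, resp_b, sharpness=10.0):
    """Blend two guided noise predictions with a per-pixel saliency-derived weight map."""
    sal_a = resp_a.abs().mean(dim=1, keepdim=True)           # (B,1,H,W) saliency of model A
    sal_b = resp_b.abs().mean(dim=1, keepdim=True)           # (B,1,H,W) saliency of model B
    weights = torch.softmax(sharpness * torch.cat([sal_a, sal_b], dim=1), dim=1)
    return weights[:, :1] * eps_a + weights[:, 1:] * eps_b

# toy usage with random "predicted noises" of latent shape (batch, channels, H, W)
shape = (1, 4, 64, 64)
eps_a, resp_a = cfg_noise(torch.randn(shape), torch.randn(shape))
eps_b, resp_b = cfg_noise(torch.randn(shape), torch.randn(shape))
blended = saliency_aware_blend(eps_a, resp_a, eps_b, resp_b)
print(blended.shape)   # the blended noise is what a DDIM step would then consume
```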

    FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs

    Data-Efficient GANs (DE-GANs), which aim to learn generative models from a limited amount of training data, encounter several challenges in generating high-quality samples. Since data augmentation strategies have largely alleviated training instability, how to further improve the generative performance of DE-GANs has become a research hotspot. Recently, contrastive learning has shown great potential for increasing the synthesis quality of DE-GANs, yet the related principles are not well explored. In this paper, we revisit and compare different contrastive learning strategies in DE-GANs and identify that (i) the current bottleneck of generative performance is the discontinuity of the latent space, and (ii) compared to other contrastive learning strategies, instance perturbation works towards latent space continuity, which brings the major improvement to DE-GANs. Based on these observations, we propose FakeCLR, which applies contrastive learning only on perturbed fake samples and devises three related training techniques: Noise-related Latent Augmentation, Diversity-aware Queue, and Forgetting Factor of Queue. Our experimental results establish a new state of the art in both few-shot generation and limited-data generation. On multiple datasets, FakeCLR achieves more than a 15% FID improvement over existing DE-GANs. Code is available at https://github.com/iceli1007/FakeCLR. Comment: Accepted by ECCV 2022.
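
    A rough sketch of contrastive learning on perturbed fake samples with a negative queue and a forgetting factor, in the spirit of the three techniques named above; the networks, perturbation scale, temperature, and decay constant are placeholder assumptions, not the released FakeCLR code.

```python
# Sketch of contrastive learning on perturbed fake samples with a memory queue.
# Two fake images generated from a latent code and its noise-perturbed copy form a
# positive pair; features stored in a queue act as negatives, down-weighted by a
# forgetting factor the older they are.
import collections
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, feat_dim, queue_len = 64, 128, 8
generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784))
projector = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, feat_dim))
queue = collections.deque(maxlen=queue_len)          # stores past fake-sample features

def fakeclr_step(z, sigma=0.1, tau=0.1, forget=0.9):
    z_aug = z + sigma * torch.randn_like(z)                   # noise-related latent augmentation
    q = F.normalize(projector(generator(z)), dim=-1)          # anchor features
    k = F.normalize(projector(generator(z_aug)), dim=-1)      # positive features
    pos = (q * k).sum(dim=-1, keepdim=True) / tau             # (B, 1) positive logits
    neg_logits, neg_weights = [], []
    for age, mem in enumerate(reversed(queue)):                # newest entries first
        neg_logits.append(q @ mem.t() / tau)                   # (B, M) negative logits
        neg_weights.append(torch.full((mem.size(0),), forget ** age))
    if neg_logits:
        neg = torch.cat(neg_logits, dim=1)
        w = torch.cat(neg_weights)                             # forgetting factor per negative
        denom = torch.cat([pos, neg + torch.log(w).unsqueeze(0)], dim=1)
    else:
        denom = pos
    loss = -(pos.squeeze(1) - torch.logsumexp(denom, dim=1)).mean()
    queue.append(k.detach())                                   # enqueue newest fake features
    return loss

# toy usage: run a few steps so the queue fills with negatives
for _ in range(3):
    loss = fakeclr_step(torch.randn(16, latent_dim))
print(loss)
```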